Journal of Forestry Research (2008) 19(4):293-297 
DOI 10.1007/s11676-008-0052-1 


RESEARCH PAPER 


Analysis of synonymous codon usage in chloroplast genome of Populus alba 


ZHOU Meng, LONG Wei 2, LI Xia 1* 

1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, P. R. China 

2 The Key Laboratory of Forestry Genetics and Engineering of State Forestry Administration & Jiangsu Provincial, Nanjing Forestry University, 
Nanjing 210037, P. R. China 

Abstract: The pattern of codon usage in the chloroplast genome of Populus alba was investigated. Correspondence analysis (a commonly 
used multivariate statistical approach) and the effective number of codons (ENc)-plot were used to analyze synonymous codon 
usage. Correspondence analysis showed that the distribution of genes on the major axis was significantly correlated with the 
frequency of G+C at the synonymously variable third position of sense codons (GC3S) (r=0.349), and that the positions of genes on axis 2 
and axis 3 were significantly correlated with CAI (r=-0.348, p<0.01 and r=0.602, p<0.01). For most genes, the ENc was close to 
the value expected from GC3S, but several genes with low ENc values lay below the expected curve. These data indicate 
that codon usage in the chloroplast genome of P. alba is dominated by mutational bias, and that natural selection for translational 
efficiency plays only a minor role in shaping it. 

Introduction 

The universal genetic code comprises 64 codons, which encode 20 
different amino acids across living organisms. Owing to the 
degeneracy of the genetic code, an amino acid may be encoded by 
two or more codons (synonymous codons). 
Non-random codon usage, or codon bias, is a common phenomenon 
in a wide variety of organisms, including prokaryotes, 
animals and plants (Akashi 2001; Duret 2002; Bonitz et al. 
1980). Synonymous codon usage varies widely between genomes 
and also between genes within a genome (Wang 2007). 
Variation in codon usage between species and within a genome 
is primarily due to directional mutation pressure 
on DNA sequences and to natural selection affecting gene 
translation (Lu et al. 2005). Several authors have suggested that 
patterns of codon usage in nuclear genomes differ between 
monocot and dicot species (Kawabe et al. 
2003; Wang et al. 2007). The numerous reports on synonymous 
codon usage bias have focused mainly on nuclear 
genomes, and few such analyses have been reported for 
organelles (Morton 1998, 1999, 2003; Morton et al. 
2000). In plant chloroplast genomes, natural selection has been 
found to be weaker in Pinus thunbergii, but there is evidence that 
an intermediate level of selection exists in the liverwort 
(Marchantia polymorpha) (Morton 1998). In the present study, we 
examined the pattern of synonymous codon usage in the 
chloroplast genome of Populus alba. The main purpose of this study is 
to investigate the codon usage pattern in this organelle. 

Materials and methods 

Sequence data 

The complete chloroplast genome sequence of P. alba 
(NC_008235) was obtained from GenBank. Using the annotation 
in the GenBank file, all protein-coding sequences, ORFs, and ycf 
or conserved open reading frames (Hallick et al. 1994) 
greater than 300 nucleotides in length were 
extracted, to avoid sampling bias in codon usage calculations 
(Wright 1990). A total of 57 genes were combined for 
codon usage analysis. 
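
The filtering and pooling step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy sequences are hypothetical, the sequences are assumed to be already extracted from the GenBank record as in-frame DNA strings, and the paper does not describe the authors' own extraction script.

```python
from collections import Counter

MIN_LENGTH = 300  # nucleotides; sequences must be longer than this


def count_codons(cds):
    """Count codons in one coding sequence (DNA alphabet, reading frame 0)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def pooled_codon_counts(sequences, min_length=MIN_LENGTH):
    """Keep sequences longer than min_length and sum their codon counts.

    `sequences` maps a gene name to its coding sequence. Returns the names
    that passed the filter and the pooled Counter.
    """
    kept = [name for name, seq in sequences.items() if len(seq) > min_length]
    total = Counter()
    for name in kept:
        total.update(count_codons(sequences[name]))
    return kept, total


# Toy input (hypothetical sequences, not taken from NC_008235).
toy = {
    "long_gene": "ATG" + "TTT" * 99 + "TAA",  # 303 nt, kept
    "short_orf": "ATG" + "TTC" * 10 + "TAA",  # 36 nt, dropped
}
kept, counts = pooled_codon_counts(toy)
print(kept, counts["TTT"], counts["TTC"])
```

Short sequences are dropped because indices such as ENc become unreliable when only a few codons are sampled per amino acid.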

Indices of codon usage 

The effective number of codons (ENc) used in a gene is a simple 
measure of codon bias (Wright 1990). It measures the 
non-uniformity of usage within synonymous groups of codons. ENc 
values can vary from 20 (extreme bias, where only one codon is 
used per amino acid) to 61 (random codon usage). Relative 
synonymous codon usage (RSCU) is the observed frequency of a 
codon divided by the frequency expected if usage within its 
synonymous codon group is uniform (Sharp et al. 1986). If all 
synonymous codons for the same amino acid are used 
equally, their RSCU values are close to 1.0, indicating a lack of bias. 
The GC3S index is the frequency of G+C at synonymously 
variable third positions of sense codons (i.e., excluding 
Met, Trp and termination codons). The Codon Adaptation 
Index (CAI) uses a reference set of highly expressed genes from a 
species to assess the relative merits of each codon, and a score 
for a gene is calculated from the frequency of use of all codons 
in that gene. The index assesses the extent to which selection has 
been effective in molding the pattern of codon usage. In that 
respect, CAI is useful for predicting the level of expression of a 
gene (Sharp et al. 1987). The CAI value for every gene was 
calculated relative to the psbA gene of the same genome (Morton 
1998). These indices of codon usage bias were calculated for 
each gene in the data set using the program CodonW (version 
1.4.2, http://codonw.sourceforge.net/). 
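
As a rough illustration of two of these indices, the sketch below computes RSCU for one synonymous family and GC3S over a chosen set of families. The function names are hypothetical and this is not CodonW's code; the Phe counts are those listed in Table 1.

```python
def rscu(counts):
    """RSCU for one synonymous family.

    `counts` maps each synonymous codon to its observed count. RSCU is the
    observed count divided by the count expected under uniform usage.
    """
    total = sum(counts.values())
    expected = total / len(counts)
    return {codon: n / expected for codon, n in counts.items()}


def gc3s(codon_counts, families):
    """Fraction of G or C at the third position of synonymous codons.

    `families` lists the synonymous families to include; Met, Trp and
    termination codons are excluded simply by not listing them.
    """
    gc = total = 0
    for family in families:
        for codon in family:
            n = codon_counts.get(codon, 0)
            total += n
            if codon[2] in "GC":
                gc += n
    return gc / total


# Phe counts as listed in Table 1.
phe = {"UUU": 892, "UUC": 468}
values = rscu(phe)
print({codon: round(v, 2) for codon, v in values.items()})

# GC3S restricted to the Phe family, for illustration only.
print(round(gc3s(phe, [["UUU", "UUC"]]), 3))
```

For a family of k synonymous codons the RSCU values always sum to k, so a value above 1 for one codon implies values below 1 for at least one of its synonyms.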

Correspondence analysis 

Correspondence analysis (COA) has become the method of 
choice for multivariate statistical analysis of codon usage 
patterns (Grantham et al. 1980; Shields et al. 1987; Sharp et al. 
1989). Since there are a total of 59 synonymous codons (61 sense 
codons, less the unique methionine and tryptophan codons), this 
analysis partitions the variation along 59 orthogonal axes, with 
41 degrees of freedom (59 codons less the 18 amino acids they 
encode). The first axis is the one that captures 
most of the variation in codon usage, with each subsequent axis 
explaining a diminishing amount of the variance. In contrast to 
other types of variance component analysis, such as Principal 
Component Analysis (PCA), correspondence analysis has the 
advantage of showing not only the distribution of genes in the 
multidimensional space, but also the corresponding 
distribution of synonymous codons. Correspondence analysis was 
primarily designed for use with data tables containing counts, 
e.g., numbers of synonymous codons, whereas PCA is a general 
method of data reduction that is more suitable for continuous 
measurement data (Perriere et al. 2002). Correspondence 
analysis of RSCU was also performed using CodonW. 
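
To make the idea of axes that explain decreasing shares of the variation concrete, the following sketch computes the share of total inertia carried by the first correspondence axis of a small table. It is a minimal pure-Python illustration, assuming a table of non-negative values with genes as rows and codons as columns; the function names and the toy table are hypothetical, and CodonW's implementation is not reproduced here.

```python
import math


def standardized_residuals(table):
    """Matrix S of standardized residuals; its squared entries sum to the
    total inertia. Assumes every row and column total is positive."""
    grand = sum(sum(row) for row in table)
    row_mass = [sum(row) / grand for row in table]
    col_mass = [sum(col) / grand for col in zip(*table)]
    return [
        [(x / grand - r * c) / math.sqrt(r * c) for x, c in zip(row, col_mass)]
        for row, r in zip(table, row_mass)
    ]


def first_axis_share(table, iterations=200):
    """Share of total inertia explained by the first correspondence axis.

    The principal inertias are the eigenvalues of S^T S; the largest one is
    found here by power iteration with a Rayleigh quotient.
    """
    s = standardized_residuals(table)
    total_inertia = sum(x * x for row in s for x in row)
    if total_inertia == 0.0:
        return 0.0  # rows and columns are exactly independent
    n = len(s[0])
    a = [[sum(row[i] * row[j] for row in s) for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        eigenvalue = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)
        v = [x / norm for x in w]
    return eigenvalue / total_inertia


# Toy table: three hypothetical genes (rows) by three codons (columns).
toy_table = [[20, 5, 5], [5, 20, 5], [5, 5, 20]]
print(round(first_axis_share(toy_table), 3))
```

A full analysis would extract all axes (for example with a singular value decomposition) and also return the gene and codon coordinates; only the first principal inertia is computed here to keep the sketch short.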

Statistical analysis 

All correlations are based on the nonparametric Spearman's 
rank correlation method as implemented in the statistical 
software SPSS version 12.0. With this measure of association, 
no distributional assumptions about the underlying data are 
required. 
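
For readers without access to SPSS, Spearman's rank correlation can be sketched in a few lines: rank both variables (averaging the ranks of ties) and take the Pearson correlation of the ranks. The function names and the toy values below are hypothetical; significance testing is not included.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical axis coordinates and index values for four genes.
axis_position = [0.10, 0.40, 0.20, 0.90]
index_value = [0.21, 0.30, 0.25, 0.28]
print(round(spearman(axis_position, index_value), 3))
```

Because only ranks enter the calculation, any monotonic transformation of either variable leaves the coefficient unchanged.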

Results 

The codon usage for 57 chloroplast genes from Populus alba is 
presented in Table 1. There is a general excess of A- and 
U-ending codons. For every amino acid with synonymous codons, 
an A- or U-ending codon is available. Among the termination 
codons, UAA is used most often. For these amino acids, there is 
substantial (and statistically significant) non-uniformity in 
synonymous codon usage (most easily seen by examination of the 
RSCU values). This reflects a mutational bias towards A+T, which 
in the absence of other selection pressures would be expected to 
increase the RSCU values for synonymous A- and U-ending codons 
to greater than 1. 


Table 1. Summary of codon usage in the chloroplast genome of Populus alba 


Amino acid  Codon     N  RSCU   Amino acid  Codon     N  RSCU   Amino acid  Codon     N  RSCU   Amino acid  Codon     N  RSCU
Phe         UUU     892  1.31   Ser         UCU     520  1.67   Tyr         UAU     725  1.65   Cys         UGU     190  1.39
            UUC     468  0.69               UCC     313  1.01               UAC     154  0.35               UGC      83  0.61
Leu         UUA     793  1.87               UCA     373  1.20   TER         UAA      31  1.63   TER         UGA      13  0.68
            UUG     518  1.22               UCG     172  0.55               UAG      13  0.68   Trp         UGG     428  1.00
            CUU     542  1.28   Pro         CCU     385  1.57   His         CAU     444  1.50   Arg         CGU     306  1.27
            CUC     156  0.37               CCC     187  0.76               CAC     148  0.50               CGC     106  0.44
            CUA     366  0.86               CCA     277  1.13   Gln         CAA     668  1.54               CGA     326  1.36
            CUG     167  0.39               CCG     131  0.53               CAG     199  0.46               CGG      91  0.38
Ile         AUU    1026  1.48   Thr         ACU     459  1.55   Asn         AAU     925  1.54   Ser         AGU     374  1.20
            AUC     397  0.57               ACC     224  0.76               AAC     276  0.46               AGC     112  0.36
            AUA     650  0.94               ACA     381  1.29   Lys         AAA     986  1.49   Arg         AGA     451  1.88
Met         AUG     544  1.00               ACG     118  0.40               AAG     340  0.51               AGG     161  0.67
Val         GUU     447  1.42   Ala         GCU     567  1.82   Asp         GAU     792  1.58   Gly         GGU     516  1.26
            GUC     155  0.49               GCC     197  0.63               GAC     210  0.42               GGC     178  0.43
            GUA     472  1.50               GCA     348  1.12   Glu         GAA     963  1.49               GGA     657  1.60
            GUG     187  0.59               GCG     136  0.44               GAG     330  0.51               GGG     289  0.70

Notes: The frequencies of codons in the 57 chloroplast genes of P. alba were summed and used to determine the set of most frequently used codons. N: sum of the 
frequencies of the codons; RSCU: relative synonymous codon usage. 

In most species examined, there is considerable heterogeneity 
of codon usage among genes (Ikemura 1985; Sharp et al. 1988), 
so it is essential to look for trends among genes using 
multivariate techniques. Correspondence analysis of relative 
synonymous codon usage (RSCU) generates a 
series of orthogonal axes that reflect the trends responsible for 
the variation in codon usage. The first axis accounts for 10.37% 
of the total variation in the dataset, and the next three axes 
account for 8.80%, 8.07% and 6.94%, with each subsequent axis 
explaining a decreasing amount of the variation. The relative 
inertia explained by the first axis (10.37%) is thus not 
remarkably high. A projection of each gene on the first two COA 
axes is presented in Fig. 1A. The origin represents the average 
RSCU for all genes, with respect to the two axes. The distance 
between genes on this plot reflects their dissimilarity in RSCU, 
with respect to the first two axes. The corresponding distribution 
of synonymous codons (Fig. 1B) shows the separation of 
C/G-ending codons and A/U-ending codons along the primary 
axis. This indicates that the variation in synonymous codon 
usage among the P. alba genes is based on the nucleotide 
content of the genes. The separation of genes on the second axis 
appears to be largely related to the level of expression. Many 
genes that are known or expected to be expressed at high 
levels in plant chloroplast genomes, such as psaA, psaB, psbB, 
psbC, psbD, atpA, atpB, rbcL and the pet genes (Klein et al. 
1986; Mullet et al. 1987; Morton 1998), are located towards one 
extreme of axis 2, while others lie at the other extreme of the 
second axis. 

Fig. 1 Correspondence analysis of the relative synonymous codon usage in 57 genes from the chloroplast genome of Populus alba. (A) The 
distribution of genes on the plane defined by the first two main axes. (B) The distribution of synonymous codons along the first and the second axes of the 
correspondence analysis 


To identify the factors that resulted in the dispersion of genes, 
the ordination of genes on the first four COA axes was examined 
for correlations with indices of codon usage and amino acid 
composition (e.g. ENc, GC3S, GC, GRAVY and Aromaticity). A 
summary of these correlations is presented in Table 2. The 
distribution of genes on the primary axis was significantly correlated 
with GC3S (r=0.349, p<0.01), but was not significantly correlated with 
CAI (r=0.024). This significant correlation between GC3S and 
codon usage suggests that 
the variation in codon usage among genes may be due to a 
mutational bias at the DNA level. The positions of genes on 
axis 2 and axis 3 were significantly correlated 
with CAI (r=-0.348, p<0.01 and r=0.602, p<0.01), and 
the ENc index was significantly correlated (r=-0.396, 
p<0.01) with the position of genes on the third axis (axis 3). Those 
genes with positive coordinates on the third axis have a more 
biased usage of codons compared to genes with negative axis 3 
coordinates. Together, these observations suggest that nucleotide 
mutational bias plays the major role, and natural selection for 
translational efficiency only a minor one, in shaping codon usage 
in the chloroplast genome of P. alba. 


Table 2. Correlation between the codon usage and amino acid usage indices 

Axis      CAI       ENc       GC3S      GC        Gravy     Aromaticity
Axis 1     0.024     0.227     0.349*    0.281    -0.232    -0.206
Axis 2    -0.348*    0.207     0.024    -0.146    -0.222    -0.198
Axis 3     0.602*   -0.396*   -0.241     0.161    -0.221     0.194
Axis 4     0.266     0.113     0.130     0.058     0.159     0.041

Notes: Correlations that are significant (p<0.01) are marked with an asterisk (*). 


In the present study, we further investigated the relationship 
between nucleotide content and codon usage with an ENc-plot. 
Wright (1990) suggested the ENc-plot (ENc plotted against GC3S) 
as part of a general strategy to investigate patterns of 
synonymous codon usage. Genes whose codon choice is constrained 
only by a G+C mutation bias will lie on or just below the curve 
of the predicted values (Wright 1990). The ENc-plot of genes 
from P. alba is presented in Fig. 2. The majority of 
the points track the reference curve quite closely, which 
indicates that the observed codon bias is most easily explained as 
a product of G+C mutation bias. However, several points with low 
ENc values lie below the expected curve, including the psbA gene, 
for which selection is known to act on codon use 
to adapt codons to tRNA availability in plant 
chloroplast genomes (Morton 1993; Pfitzinger et al. 1987). This 
pattern probably reflects the fact that codon usage is still 
dominated by a mutational bias in P. alba chloroplast genes, and 
that selection appears to be limited to a subset of genes and to 
affect codon usage only subtly. 



Fig. 2 Effective number of codons (ENc) used in each gene plotted 
against GC content at synonymously variable third positions of 
codons (GC3S). The continuous curve plots the relationship between ENc 
and GC3S in the absence of selection. The psbA gene is indicated by a 
triangle 
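
The reference curve in an ENc-plot follows the approximation given by Wright (1990), ENc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3S expressed as a fraction. The sketch below evaluates that curve and flags genes lying below it; the function names, the tolerance parameter and the observed ENc value of 40.0 are hypothetical, while 0.255 is the mean GC3S reported for P. alba chloroplast genes.

```python
def expected_enc(gc3s):
    """Expected ENc when codon choice reflects only G+C content at silent
    third positions (Wright 1990). `gc3s` is a fraction between 0 and 1."""
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def below_curve(enc, gc3s, tolerance=0.0):
    """True when an observed ENc falls more than `tolerance` below the curve."""
    return enc < expected_enc(gc3s) - tolerance


# Expected ENc at the mean GC3S of 25.5% reported for P. alba.
print(round(expected_enc(0.255), 2))

# A hypothetical gene with ENc = 40.0 at the same GC3S.
print(below_curve(40.0, 0.255))
```

Genes flagged this way are only candidates for selection on codon usage; a position below the curve shows that GC3S alone does not account for the observed bias, not what causes it.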

Discussion 

Base composition is the most frequently reported DNA feature 
influencing codon usage, and is probably one of the most 
pervasive. Base composition reflects a balance between mutational 
pressures towards and away from G+C nucleotide pairs (Sueoka 1962). 
The overall G+C content of the P. alba chloroplast genome 
is estimated to be 36%; the mean frequency of G+C at the 
synonymously variable third codon position (GC3S) is 25.5%, somewhat 
lower than the value for the genome as a whole. Codon usage in 
P. alba chloroplast genes is biased toward a high representation 
of NNA and NNU codons, similar to what has been found in 
other plant chloroplast genomes (Wolfe et al. 1988; Morton 1993). 
The origin of such compositional constraints (GC/AT pressures) 
is still a matter of debate. Either these compositional constraints 
are the result of mutational biases (Sueoka 1988; Wolfe et al. 
1989), or natural selection plays the major role, leading to 
preferential fixation of non-random dinucleotide and base 
frequencies (Bernardi 1993; Bernardi et al. 1986; Nussinov 1984). 
Wolfe et al. (1992) noted that chloroplast gene codon usage 
appears to reflect a mutational bias rather than selection. This is 
also suggested by the correlation between genome AT content 
and overall codon bias (Morton 1993). Among plants, Pinus 
thunbergii and the flowering plants appear to have very weak natural 
selection on codon usage in the chloroplast genome, such that only a 
few genes, in particular rbcL and psbA, show any evidence for 
selection, but there is evidence that an intermediate level of 
selection exists in the liverwort Marchantia polymorpha (Morton 
1998). 

In the present study, correspondence analysis and the ENc-plot were 
used to investigate patterns of synonymous codon usage in the P. 
alba chloroplast genome. The significant correlation 
between the frequency of G+C at synonymously 
variable third positions of sense codons (GC3S) and codon usage 
suggests that the variation in codon usage among genes may 
be due to a mutational bias at the DNA level rather than natural 
selection acting at the level of mRNA translation. Although 
codon usage of chloroplast genes appears to be a result of a 
mutational bias toward a high AT content, the data presented here 
also suggest that some selection acts on the codon usage of 
a few P. alba chloroplast genes. The evolution of codon bias over 
all of the chloroplast lineages is a complex matter. Several 
factors are likely to be involved in determining the selective 
constraints on codon bias (Morton 1998), and recent work has 
indicated that it is a dynamic process (Morton et al. 1997). The 
variation in selective constraints among the different lineages 
also makes it likely that substitution dynamics differ 
substantially between lineages, which might be related to the 
debate concerning how composition bias influences the 
phylogenetic reconstruction of chloroplast origins (Lockhart et al. 1992). 